Summary

This open source dashboard presents the uptake of hybrid open access for 4,970 subscription journals from 68 publishers. Since 2013, these journals have published 7,390,888 articles, of which 250,087 were made openly available without delay, representing a hybrid open access share of 3.4%.

Hybrid open access journals are included when they share the following two characteristics:

  1. Academic institutions sponsored immediate open access publication of individual articles according to the Open APC initiative,
  2. The publisher provided open content license statements via Crossref including information about whether articles were made openly available directly upon publication or not.

By bringing together openly available datasets about hybrid open access into one easy-to-use tool, this dashboard demonstrates how existing pieces of an evolving and freely available data infrastructure for monitoring scholarly publishing can be re-used to gain a better understanding of hybrid open access publishing. It, thus, contributes to recent calls from the Open Access 2020 Initiative and the cOAlition S with its Plan S aiming at a data-driven debate about how to transition subscription-based journal publishing to open access.

This document gives information about the analytical design, as well as how to use the dashboard. Because this open source dashboard was built around already existing infrastructure services for scholarly publishing, discussion will also include guidance about how publishers can properly report hybrid open access journal articles to Crossref in accordance with evolving standards like the ESAC Guidelines for Transformative Agreements.

Background and motivation

Many publishers offer hybrid open access journals (Suber 2012). However, because of non-standardized reporting practices, it is hard to keep track of how many articles these journals provided in open access, and to what extent these figures relate to the overall article volume (Björk 2017; Laakso and Björk 2016). In particular, determining subscription-based journals that already did publish open access articles immediately, as well as obtaining licensing information about access and re-use rights is challenging (Laakso and Björk 2016; Piwowar et al. 2018). Another question around analyzing hybrid open access is how to distinguish it from delayed open access, a publishing model, which a considerable number of journals practise as well (Laakso and Björk 2013). In the delayed model, some or all articles of an issue are made openly available to all readers after an embargo period.

There are varying attempts to investigate the uptake of hybrid open access. While earlier bibliometric studies examined reports from publishers (Björk 2017) or obtained articles manually from publisher websites (Laakso and Björk 2016), Unpaywall has recently become a widely used service to discover open access articles including hybrid open acccess (Piwowar et al. 2018, @Else_2018). Integrated in major bibliometric databases like the Web of Science, Unpaywall links DOIs from Crossref, a DOI registration agency for scholarly works, to free full-texts. Several bibliometric studies and monitoring services investigated the prevalence of hybrid open access based on Unpaywall data (e.g. Bosman and Kramer (2018), Robinson-Garcia, Costas, and Leeuwen (2019) or the German Open Access Monitor). Following Piwowar et al. (2018), these approaches determined hybrid open access articles as publisher provided open access not being published in fully open access journals listed in the Directory of Open Access Journals (DOAJ). Moreover, articles must be published under an open content license to be characterized as hybrid open access, an information, which Unpaywall retrieved from Crossref and publisher websites.

Despite the considerable efforts to extend the evidence base around open access, Unpaywall’s approach to determine hybrid open access is limited in two respects. First, there are some fully open access journals, which are not indexed in the DOAJ, presumably because these journals did not comply with its comprehensive inclusion criteria. In a large-scale study, Crawford (2017) identified 7,743 non-DOAJ indexed fully open access journals that published at least one article beween 2012 and mid 2016. Second, Unpaywall focuses on current open access provision, but does not track when an article was made openly available, which, in turn, makes it hard to distinguish between immediate and delayed open access provision using this data source alone.1 Consequently, Piwowar et al. (2018), in contrast to most definitions and earlier studies (Laakso and Björk 2016), described hybrid open access articles tagged by Unpaywall as being “not necessarily immediately available (i.e., they may only be freely available after an embargo)”. For these reasons, facilitating bibliometric studies about hybrid open access remains challenging.

This above-described lack of standardized methods and publicly available data about hybrid open access publishing limits not only its quantitative study, but also informed policy-making around open access (Laakso 2019). Particularly, the business model of hybrid open access journals is disputed, because publishers often charge publication fees, also known as article processing charges (APC), to provide immediate open access to individual articles in addition to subscriptions (Suber 2012). Although it was initially envisioned that with growing funding opportunities for publication fees publishers would progressively transition hybrid open access journals to fully open access (Prosser 2003), it remains unclear to which extent the increasing willingness to pay for open access contributed to it, and how cost-effective these spendings were (Björk and Solomon 2014; Pinfield, Salter, and Bath 2017).

Funders and libraries have responded to the problems of missing evidence around hybrid open access publishing in the last years. To make expenditures more transparent, a growing number of institutions have started to disclose individual articles they supported as open data. The Wellcome Trust, the Austrian Science Fund FWF and British universities were among the first who shared their spendings for hybrid open access articles (Kiley 2014; Reckling and Kenzian 2014; Lawson 2015). The Open APC Initiative collects and standardizes these openly available spending data together with crowd-sourced expenditures. Because Crossref indexes most articles where institutions sponsored publication fees, Open APC can use its metadata services to make open access expenditures comparable at the level of institutions, publishers and journals (Jahn and Tullney 2016; Pieper and Broschinski 2018). So far, the Open APC Initiative disclosed 73,292 hybrid open access articles supported by 305 research performing organisations and funders between 2013 - 2019.

Additionally, funders and libraries have developed compliance criteria including machine-readable Creative Commons license statements to improve the discoverability of open access content. The Sponsoring Consortium for Open Access Publishing in Particle Physics – SCOAP\(^3\), for example, requires CC-BY licenses. It also archives the full-text of funded articles in several formats in a dedicated repository. In Europe, the Wellcome Trust refers to the life science repository Europe PubMed Central (Europe PMC) for depositing funded articles along with a comprehensive set of metadata. Moreover, the funder automatically checks if authors and publishers comply with these obligations (Kiley 2015). In its contract with the publisher Wiley, the German DEAL consortium stated comprehensive metadata obligations to be implemented using Crossref and its metadata profile (Sander et al. 2019). Likewise, the German Deutsche Forschungsgemeinschaft (DFG) referred to Crossref as metadata service for hybrid open access articles facilitated through its funding programme “Open Access Transition Agreements”. In the US, Chorus, a non-profit serving more than 50 publishers, analyzes Crossref metadata and assesses open access compliance for dedicated funders using interactive dashboards.

In the light of a perceived slow and ineffective growth of open access, more and more funders and libraries alter their spending on subscription-based journals and individual open access articles published in these outlets. Notably, the Open Access 2020 Initiative calls for a transparent approach to re-allocate budgets currently spent for subscriptions to open access business models. Likewise, the cOAlition S, an international network of research funders, released its widely discussed Plan S in September 2018. Starting from 2021, members of the cOAlition S intend to discontinue funding of publication fees for individual open access articles in subscription-based journals. However, hybrid open access articles published under a transformative agreement, in which spendings for subscription and open access publication are considered together, will remain eligible for funding. Nonetheless, the Plan S requires a commitment to full open access transition from the publishers. Journals providing delayed open access are not compatible with the Plan S.

In its Plan S implementation planning the cOAlition S refers to the ESAC Initiative, which aims at standardizing open access workflows between publishers and research institutions, to observe the transition process of journals under transformative agreements. In 2017, ESAC, based at the German Max Planck Digital Library (MPDL), released guidance about how to implement transformative agreements. According to these guidelines, publishers have to ensure that only corresponding authors affiliated with the agreement institution are eligible for open access support. Furthermore, publishers need to provide comprehensive metadata to Crossref including open content license information. They are also obliged to report sponsored articles to the agreement institutions (Geschuhn and Stone 2017). Complementary to spendings for individual hybrid open access articles, some library consortia like the Swedish Bibsam or the British Jisc have begun to make these reports openly available with the Open APC Initiative (Lundén, Smith, and Wideberg 2018; Pieper and Broschinski 2018). To support the monitoring of transformative agreements at the publisher-level, ESAC has started an agreement registry targeting national consortia to disclose their contracts.

These developments show that open and standardized reporting is critical to assess hybrid open access publishing. Against this background, this work presents a new methodology for investigating the uptake of hybrid open access, taking into account journals with the following two characteristics:

  1. Academic institutions sponsored immediate open access publication of individual articles according to the Open APC initiative,
  2. The publisher provided open content license statements via Crossref including information about whether articles were made openly available directly upon publication or not.

To my knowledge, this is the first approach that combines openly available data about the productivity of and the spending for hybrid open access journals. Reflecting recent compliance efforts from funders and libraries, it extends existing methods to find hybrid open access articles in two ways: First, it addresses the difficulties to identify hybrid open access journals, which already did publish individual open access articles by using spending information. Second, this approach uses Crossref metadata to exclude delayed open access provision that does not comply with the Plan S.

The aim of this work is to address the following questions that are relevant to researchers and practitioners alike:

  • How many hybrid open access journals and publishers have provided immediate open access to individual articles during 2013-2019?
  • What has been the yearly relative and absolute hybrid open access uptake?
  • To what extent is spending information for hybrid open access articles publicly available?
  • Do hybrid open access uptake levels vary by publisher, journal and country?

Data and methods

Methods follow the Wickham-Grolemund approach to practice data science (Wickham and Grolemund 2017). After importing, cleaning (“tidying”) and transforming data from various sources, a process called “data wrangling”, summary statistics were calculated and visualized to understand and communicate the uptake of hybrid open access publishing. For the latter, we created a dashboard, which allows visual interaction with our data. The workflow, illustrated in Figure 1 and described more detailed in this section, was implemented in R using open data and tools, making it transparent and re-usable.

This project, which is under active developement, started in November 2017, and is updated on a regular basis. Data used in the current version were gathered on 2020-01-07. Along with the data, methods are shared in the source code repository of this project, which is hosted on GitHub.

Summary of data and methods used, following the Wickham-Grolemund approach to practice data science (Wickham and Grolemund 2017)

Data Analysis

To reflect the challenge of finding hybrid open access journals with published open access articles (Laakso and Björk 2016), I started with a sample of hybrid open access journals from the Open APC initiative. This open data initiative crowd-sources information about spending on open access journal articles from various international research organisations. Its openly available dataset differentiates expenditure for articles published in hybrid and in fully open access journals. It also has a dedicated dataset containing information about articles, which were made openly available as part of transformative agreements, deals between publishers and large research organisations or consortia aiming at transitioning subscription-based licensing to open access business models. Using data from the Open APC initiative, thus, ensured that only hybrid open access journals with at least one institutionally funded open access article were examined.

After obtaining a list of hybrid open access journals from the Open APC initiative, Crossref’s REST API was queried to discover open access articles published in these journals, as well as to retrieve yearly article volumes for the period 2013 - 2019. Using the rcrossref client (Chamberlain et al. 2019), developed and maintained by the rOpenSci initiative, the first API call retrieved all license URLs available per journal using all ISSN variants available in the Open APC dataset. To control developments of the publishing market resulting in name changes of publishers or journal titles over time, only the most frequent facet field name was used. I also drew on facet field counts to obtain the yearly article volume per journal from Crossref. After matching license URLs indicating open access articles with the help of the dissem.in / oaDOI access indicator list, a second API call checked licensing metadata per journal. Here, I excluded delayed open access articles by using the Crossref’s REST API filters license.url and license.delay for each year in the period of 2013 - 2019. Next, URLs representing licenses were normalized to Creative Commons or publisher specific open content license statements. Because journal business models can change from hybrid to fully open access over time, the Directory of Open Access Journals (DOAJ), a curated list of fully open access journals, was finally checked to exclude these journals by ISSNs from the date of the open access flip onwards. To improve this matching, DOAJ data was enriched with further ISSN variants from Rimmert et al. (2017). Journals with an open access proportion above 0.95 over two years were also excluded. Because journals can also change from fully open access to a subscription-based business model (Matthias, Jahn, and Laakso 2019b), a comprehensive dataset of reverse flip journals (Matthias, Jahn, and Laakso 2019a) was finally used to identify such journals in our sample, and to exclude annual volumes published before the conversion from it.

Using Unpaywall data, I was able to determine additional articles with open content license statements, and compared them with our journal sample. After importing the Unpaywall dump to a Google Big Query instance, I searched all articles where license statements were available for the version of record with the bigrquery interface (Wickham 2018)2. Only articles from 2013 onwards in non-DOAJ indexed journals were considered. Subsequentially, I retrieved all ISSN variants for the corresponding journals in the Unpaywall sample and matched them with our hybrid open access journal sample. Finally, we compared articles contained in our sample with that of Unpaywall using DOIs. Because DOI names are case insensitive, DOIs were converted to lowercase before the matching.

Information about corresponding authors play a crucial role in open access funding (Schimmer, Geschuhn, and Vogler 2015; Geschuhn and Stone 2017). Because not all publishers share standardized affiliation data with Crossref, I text-mined our article sample for author email addresses, assuming that email domains roughly indicate the affiliation of the first respective corresponding author at the time of publication. First, I interfaced PubMed Central’s Open Access subset and, if the article was not available via PubMed Central, the Crossref TDM service using crminer from rOpenSci (Chamberlain 2018). From the so downloaded full-texts, I extracted the first email address occurrence per article using pattern matching. Next, I split the email domains in its parts with urltools (Keyes et al. 2018). To avoid misuse, particularly academic spamming, source code and data used for this text-mining exercise are currently stored in a privately git-repository until it is secured that no full author email addresses are accidentally shared.

After exploring the so obtained data, communication is the final step in the Wickham-Grolemund approach to practicing data science. Given the increasing demand for data-driven insights about the uptake of hybrid open access, I decided to present the results as an interactive dashboard, an efficient way to communicate results from a data analysis (Wickham and Grolemund 2017). To this end, I prototyped a shiny web application (Chang et al. 2019) that shows the development of hybrid open access publishing. More specifically, I used the flexdashboard package (Borges and Allaire 2017) that belongs to the R Markdown authoring framework for data science. This allowed coding the analytical web application within an R-based data analytics workflow (Allaire et al. 2019; Xie, Allaire, and Grolemund 2018), thereby shielding the data work from complex components written in other programming languages needed for interactive web applications.

Staying in the R-based data analytics workflow enabled re-using many R packages, which were also applied for data wrangling and exploration, particularly dplyr (Wickham et al. 2019) and tidyr (Wickham and Henry 2019). For data import and storage, I used the packages readr (Wickham, Hester, and Francois 2017) for csv and jsonlite (Ooms 2014) for json files. Furthermore, exploratory visualisations made with ggplot2, a domain specific language for statistical graphs (Wickham 2016), were converted to interactive plotly charts (Sievert 2018), an open-source charting library written in JavaScript. Additionally, the echarts4r package (Coene 2018), an interface to the javascript-based Echarts library, was used for presenting the distribution of email domains and their parts as hierarchical treemaps.

The dashboard application, written as R Markdown document, is hosted by shinyapps.io, and is accessible under the following link: https://subugoe.shinyapps.io/hybridoa. Along with the source code for the data analysis and the dashboard, datasets used are available via GitHub along with a brief description about the variables they contain: https://github.com/subugoe/hybrid_oa_dashboard/blob/master/data/README.md. Using Git, the history of the text-based datasets and source code was tracked, too.

Data Accuracy

Notice that my estimates about the extent of hybrid open access publishing are likely to be conservative. Probably not all authors and institutions shared information about hybrid open access spending with the Open APC initiative, because reporting is voluntary (Jahn and Tullney 2016). Despite its international focus, most hybrid open access spending information in Open APC originated from European institutions, in particular from the UK. In case of transformative agreements, it must be noted that the Open APC Initiative has a strong focus on articles from the Springer Compact scheme (Pieper and Broschinski 2018). However, there are complementary ways to implement open access funding that are not (yet) covered by the Open APC initiative, as the example of the Sponsoring Consortium for Open Access Publishing in Particle Physic – SCOAP\(^3\) demonstrates, which uses an open access repository to share sponsored articles.

Similarly, it is likely that not all publishers reported comprehensive metadata about access and re-use with Crossref, because the DOI registration agency did not mandate license statements for open access content at the time of our study. Piwowar et al. (2018) found that Crossref metadata lack some license coverage. Therefore, they investigated publisher websites for license statements as well. Furthermore, they observed that the percentage of open access provided by publishers, which was free to read, but where the journal was neither listed in the DOAJ nor provided licensing metadata, which they refer to as “bronze open access”, was the most common method to provide open access among publishers. Finally, the study estimated that the Crossref genre type “journal-article” did not necessarily refer to original article and reviews. Instead, around 7% were front-matters.

To assess data accuracy of this dashboard, we first investigated how many hybrid open journals covered by the Open APC initiative provided license statements to Crossref between 2013 - 2019. Next, we evaluated the accuracy of the automated retrieval using article random samples.3

Coverage accuracy

In the case of hybrid open access journals represented in the Open APC datasets, 68 publishers provided licensing statements via the Crossref API, representing 33 % of all publishers studied. At the journal-level, 84 % of all hybrid open access journal titles covered by the Open APC initiative are associated with open content license statements valid without delay in Crossref. Figure 2 provides a breakdown of licensing metadata coverage per publisher. It highlights that leading commercial publishing houses provided open content license statements to Crossref for most of their journals listed by the Open APC initiative including Springer Nature, Elsevier BV and Wiley. SAGE Publications and many smaller sized publishers, however, did not share license metadata to Crossref. Consequently, their journals were not included in our study.

Overview of Crossref licensing coverage per publisher. Yellow dots represent the number of hybrid open access journals disclosed by the Open APC initiative with licensing metadata, blue dots the overall number of hybrid open access journals in our sample.

Retrieval accuracy

51,677 out of 73,292 hybrid open access articles disclosed by the Open APC initiative provided open content license statements valid without delay, representing a percentage of 71 %. To assess the accuracy of our retrieval, we manually checked a random sample of 100 Open APC articles for which no license was found. We determined 93 articles that did not share license statements with Crossref using the license node. The other 7 articles did report an open content license, but with a delay (delay-in-days metadata field) above 0 days.4

Drawing another random sample of 100 articles, we manually validated if we obtained the correct license statements including start date from the Crossref API. We also evaluated if all journal articles were original articles or reviews. 94 articles were characterized as original article or review, confirming previous studies (Piwowar et al. 2018). Other document types were conference abstracts (N = 3), a medical guideline, a comment and a short report.

We were able to retrieve email addresses for 91 % of all articles in our dataset. Furthermore, recall and precision for obtaining and extracting email addresses from full-texts were investigated using a random sample of 200 articles. Recall asked if all author email addresses were found. Precision asked if the match of the first author email address was correct. While all first author email addresses were correctly parsed (precision = 1), 40 articles contained more than one email address from authors (recall = 0.8). In two cases, the corresponding author named two email addresses.

Results

Using data from Open APC and Crossref, we found 250,087 open access articles published without delay in 4,970 subscription-based journals from 68 publishers between 2013 - 2019. Overall, these journals published 7,390,888 articles, resulting in a hybrid open access share of 3.4 %.

In the following, this section presents the functionality of the interactive dashboard, and highlights key findings from an exploratory data analysis. The dashboard itself is divided into three webpages, which are accessible through the navigation bar. The first webpage, “Overview”, summarizes the development of hybrid open access publishing. The analysis can be subsetted by a publisher or a journal. The second webpage, “Compare”, lets you analyze the uptake of hybrid open access across publisher and years. The third webpage, “Institutional View”, is similar to the first webpage, but allows browsing by email domains instead of publishers and journals.

Overview: Longitudinal development of hybrid open access publishing

Launching the app shows up the key findings, which can be broken down by publisher or journal using the select boxes in the left sidebar. Publisher names are listed in decreasing order of the size of the article volume. Journals are filtered conditionally on the publisher selection and are sorted alphabetically.

Growth of hybrid open access

The upper part of the dashboard highlights the longitudinal development of hybrid open access publishing between 2013 - 2019. The first tab shows the relative uptake, while the second tab presents the hybrid open access article count on a yearly basis. Similar to the dashboard, Figure 3 illustrates the relative and absolute uptake since 2013. Bar charts are sub-grouped according to normalized CC license types. Overall, results indicate that the number and proportion of hybrid open access journal articles rose steadily from 2013 (10,153 articles, OA share: 1.1 %) to 2018 (55,146 articles, OA share: 4.8 %). CC-BY is the most prevalent open content license found. Around 65 % of open access articles were made available using this licence, followed by the less permissive license CC-BY-NC-ND, representing 25 % of the articles.

Uptake of hybrid open access license statements. Graph A shows the relative growth, B the absolute number of articles found since 2013.

Comparison with license information from Unpaywall

The third tab presents the article coverage of the dashboard, and compares it with the number of additional articles that were retrieved from Unpaywall for the same set of hybrid open access journals. Although Unpaywall determined hybrid open access articles using open content licenses from Crossref as well, the service furtermore searched publisher websites. In doing so, Unpaywall did not keep track when an article was made openly avaible. The extent of the discrepancies in this comparison indicates how comprehensively publishers reported license metadata to Crossref, and whether or not journals provided delayed open access along with hybrid open access. In total, 144,027 additional open access articles could be retrieved using Unpaywall.

Similar to the dashboard presentation, Figure 4 compares the development of hybrid open access according to our data (blue area) with additional articles with open access license statements from Unpaywall (gray area). The figure provides a yearly breakdown for the two largest publishers in our sample, Elsevier and Springer Nature. Open access articles from the remaining 66 publishers were reduced to the residual category “Other”. The sharp decline of articles found by Unpaywall for Elsevier journals suggests that delayed open access was offered along with options to make individual articles open access upon publication. Indeed, Elsevier lists 130 journals with embargo periods between six and 48 months. A prominent example is the life-science journal Cell where all articles are made freely available after an embargo period of twelve months. While we found 248 hybrid open access articles published without delay, Unpaywall lists another 2,324 Cell articles with open content license, representing 52 % of the journal’s article volume that had been published since 2013. On the other hand, the number of additional Springer Nature articles obtained using Unpaywall’s license evidence is much lower, suggesting that delayed open access plays a smaller role in Springer Nature’s hybrid open access journal portfolio, and that license statements were shared to a large extent with Crossref.

Hybrid Open Access License Statements by publisher and source

Transparency of hybrid open access

In the lower part of the dashboard page “Overview”, the left-hand graph shows the extent to which information about open access funding was publicly available. On the right, an interactive chart presents the top-level and lower-level domain of the first or corresponding author’s email address, respectively.

Similarly to the left hand graph, Figure 5 presents the availability of spending information for open access articles, highlighting the two largest publishers Elsevier BV and Springer Nature. Open access articles from the remaining 66 publishers were reduced to the the residual category “Other”. Bars show the total number of hybrid open access articles per year and publisher. The bars are stacked according to the sources that disclosed the support of open access publication at the article-level (colored stacks), and where no such spending data was available for hybrid open access articles found (gray stacks labeled with NA).

The figure reveals large differences how funders and libraries enabled hybrid open access publications across publishers. While individual payments for publication fees dominated the sponsorship of open access publications in Elsevier journals, and that of many other publishers, supported Springer Nature hybrid open access articles mostly come from transformative agreements (TA). The figure also highlights that these agreements contributed to the growth of hybrid open access articles published in Springer Nature journals. Moreover, Figure 5 suggests that transformative agreements contribute to transparency: Whereas spending for 34 % of hybrid open access articles published in Springer Nature journals was disclosed with the Open APC initiative, the origin of expenditure for hybrid open access articles in Elsevier journals was available to a lesser extent (16 %). The largest match between publication and spending data, however, could be found for hybrid open access journals sponsored by the SCOAP\(^3\) consortium: the SCOAP\(^3\) repository tracked 94 % of hybrid open access articles published in SCOAP\(^3\) sponsored journals. It must be noted, however, that SCOAP\(^3\) supports high-energy physics content in these journals only. The remaining 6.3 % were therefore likely published on related topics. In total, spending information for 23 % of articles was found.

Development of spending disclosure for hybrid open Access articles across publishers. Sources include expenditures for individual articles (“Open APC (Hybrid)”) and articles from transformative agreements (“Open APC (TA)”) provided by the Open APC Initiative. Overlap between hybrid open access articles found via Crossref and the SCOAP\(^3\) repository is displayed as well. The light gray stack areas represent the number of articles where no information about open access sponsorship was available. Notice that it is very likely that the overall decrease of spending for hybrid open access reported to the Open APC initiative in 2019 is due to a lag between the time payments were made and reporting of payments to the initiative.

Email Domains from First / Corresponding Author

The interactive chart on the lower right of the first dashboard page presents email domains extracted from open access full-texts. These domains roughly indicate the affiliation of the first or of the corresponding authors, respectively, a data point used to delineate open access funding. In the dashboard, a hierarchical, interactive treemap visualizes the distribution of the email domains. Each top-level domain can be subdivided further into domain names representing academic institutions or companies. The size of each rectangle is proportional to the number of hybrid open access articles corresponding to this domain.

Figure 6 presents a breakdown by email domain suffix. In total, 228,433 email addresses were retrieved and parsed, representing 91 % of all articles in our dataset. 46,865 articles were associated with academic institutions in the UK (“ac.uk”), followed by domains from commercial organizations, mostly email providers like gmail.com or the Chinese 163.com and 126.com, and US-American institutions of higher education (“edu”). The figure illustrates that European institutions from the Netherlands (“nl”), Germany (“de”), Sweden (“se”), Austria (“ac.at”) and Poland (“edu.pl”) were among the top 10. In total, 451 top-level domains were retrieved.

Text-mined top-level domains of author email addresses from hybrid open access journal articles. Only the first email address per article was considered.

Compare: Variation in Hybrid Open Access Publishing

The second dashboard page, “Compare”, allows to explore how hybrid open access publishing varies across publishers. It focuses on market shares, and uptake levels of hybrid open access across journals and publishers. A table at the bottom of the webpage presents key indicators at the journal-level, which can be searched and downloaded as Excel spreadsheet.

Figure 7, which is also shown at the top of the “Overview” dashboard page, presents the top three publishers in terms of the number of hybrid open access articles published since 2013. Grey bars represent the total number of hybrid open access articles, colored bars the number of hybrid open access articles per publisher. These top three publishers – Elsevier BV, Springer Nature and Wiley – accounted for the largest proportion of open access articles (78 %), and that of hybrid open access journals (72 %).

Hybrid open access articles by publisher, shown as proportion of the total number of hybrid open access articles found. The colored bars show the number of articles per publisher, grey bars the overall distribution.

As shown in Figure 8, and in the mid-left panel in the dashboard’s “Compare” page, numbers and proportion of hybrid open access journal articles vary across publishers and journals. In the two-years period 2017-2018, for example, the mean open access proportion per Springer Nature journal was 12 % (SD = 10 %), whereas the mean open access proportion per journal published by Elsevier BV was 4.2 % (SD = 4.9 %). The publisher Wiley performed between these two: The mean proportion was 5.9 % (SD = 5.2 %).

Box plot characterizing spread and differences of the share of open access articles provided by subscription-based journal per publisher between 2017 and 2018 using five summary statistics (the median, the 25th and 75th percentiles, and 1.5 times the inter-quartile range between the first and third quartiles), and visualizing all outlying points individually. Axis displaying proportions was limited to around 30%.

Institutional view

The third dashboard page, “Institutional View”, provides a complementary view on the growth of hybrid open access. Instead of presenting the longitudinal development at the publisher and journal level, the data can be subsetted by double-clicking on the email domains shown in the treemap visualization. The top-level domain indicates the country and the sub-domain represent the name of the affiliation of the first or corresponding authors. The so selected data subset is then broken down by license statements, and by the availability of spending information. At the bottom, a table presents a breakdown by publisher per selection.

Figure 9 illustrates the uptake of hybrid open access, focusing on the public availability of spending information from academic institutions and consortia, an important measure to assess the transparency of open access funding. The figure highlights the prominent role hybrid open access publishing has among authors affiliated with British academic institutions; more than 20% of all hybrid open access articles were published from UK-based first or corresponding authors. According to spending data from the Open APC initiative, hybrid open access publication for articles from the UK were facilitated both through individual publication fees, also known as article-processing charges (APC), and transformative agreements.

The large overlap between publication data and spending information from the Open APC collection for transformative agreements can be explained by coordinated national activities to transition subscriptions. Together with the UK, Dutch, Swedish and Austrian library consortia participate in the Springer Compact license scheme. Under this agreement, members from participating institutions are subscribed to read around 1,800 journals. Corresponding authors affiliated with these institutions can also publish open access in them. These consortia actively fed data about supported articles into Open APC (Pieper and Broschinski 2018). Spending information from countries without an coordinated approach toward hybrid open acces publishing including transformative agreements, however, were not well represented in the spending data provided by the Open APC initiative. The sponsors of these open access publications were therefore unknow to us.

Availability of spending information for hybrid open access articles by top-level domain of email address from first respective corresponding authors

Discussion and conclusion

In the light of the ongoing debate on the transition to open access, this dashboard highlights a steady, but varying uptake of hybrid open access. Data collection takes advantage of an improved level of compliance with policy recommendations that have addressed open access discoverability and transparency of workflows in recent years. The findings suggest that both major publishing houses and transformative agreements have a large impact on hybrid open access uptake. However, open access funding for most hybrid open access articles remains unknown, which raises important questions about the transparency of this publishing model.

The presented longitudinal findings support earlier studies, confirming the growth of hybrid open access publishing. Björk (2012) observed a slow, but constant rise of hybrid open access. The author estimated 12,000 open access articles under this model in 2012. Laakso and Björk (2016) found that during the period of 2007–2013 both the number of journals with at least one open access article and the article volume increased. They determined 13,994 hybrid open access articles published in 2,714 different journals during 2013. During this year, around 40% of journals offering hybrid options did, in fact, publish open access articles. In a follow-up study, Björk (2017) noticed an accelerated growth of articles since 2014, whereby 20 major publishers published around 44,395 hybrid open access articles in 2016. According to this study, an increasing willingness to fund publication fees, in particular among Austrian and British research funders and universities, likely influenced this strong uptake. But also more and more national consortia and large research institutions accomplished to accommodate open access in their agreements with publishers.

When combining openly available bibliometrics data with spending data, these trends become apparent. UK-based corresponding authors were responsible for around every fifth hybrid open access article, presumably because of comprehensive funding activities in the course of the Finch report in 2012 (Pinfield, Salter, and Bath 2015). Austria, where the Austrian Science Fund (FWF) has financed hybrid open access publication fees since many years (Reckling and Kenzian 2014), is also well represented among the top countries publishing hybrid open access. Not surprisingly, publishers with a leading role in hybrid open access publishing are the ones that also dominate the subscription market. Moreover, transformative agreements have strongly driven growth in hybrid open access uptake since 2015. Most notably, Springer Nature, who already had been among the first with an aligned portfolio of hybrid open access journals, has offered the Springer Compact agreement to selected national consortia and large research organizations since then (Laakso, Solomon, and Björk 2016; Geschuhn and Stone 2017). Authors affiliated with countries where these agreements were in place, including the UK, the Netherlands, Sweden and Austria, ranked among the top in terms of published hybrid open access articles. Likewise, Springer Nature not only publishes the largest volume of hybrid open access journals and articles. The publisher also has an above-average uptake. In contrast, Elsevier BV, a publisher reported as being reluctant to transformative agreements (Else 2018b), performs at a lower level.

Despite the push to hybrid open access, particularly to transformative agreements, only a small proportion of articles, around 5 % in 2018, has become openly available under this model so far. Even though former studies observed higher uptake levels, probably because of methodological differences, these can still be described as low. Studying subscription-based journals with at least one identified hybrid open access article, Laakso and Björk (2016) determined an open access share of 3.8 % for the period of 2011–2013. Piwowar et al. (2018), estimated an even higher proportion according to which 9.4 % of articles published in 2015 were provided open access with license statements, with or without delay, by non-DOAJ indexed journals. Depending on the disciplinary, institutional and funding context, uptake levels can vary (Severin et al. 2018; Siler et al. 2018). However, the proportion of hybrid open access articles is likely of little use as an indicator to assess the feasibilitly of journal transition to fully open access. Rather, motivation for journal flipping comes from different actors and draws on a variety of strategies to sustain journal operation (Laakso, Solomon, and Björk 2016; Matthias, Jahn, and Laakso 2019b).

As discussed in the methods section, the scope of the dashboard is delimited to journals that follow best-practices to ensure the discoverability and transparency of hybrid open access publishing. Particularly, many smaller publishing houses that demonstrably published hybrid open access articles, do not share license statements with a start date through Crossref. It must also be noted that around 6% of publications presented as hybrid open access articles refer to other types of journal contributions that funding mechanisms for hybrid open access usually not cover, including conference abstracts or comments. Matching and cross-validating Crossref metadata dumps with bibliometric databases is a promising approach to better understand these metadata issues (Akbaritabar and Stahlschmidt 2019). Likewise, the Open APC Initiative partly covers open access funding information and appears to miss a considerable amount of articles

An obvious direction for future work is to extend the hybrid open access journal sample to journals obtained from publisher websites. Example for such bibliometric studies addressing hybrid open access are Laakso and Björk (2016) and Matthias, Jahn, and Laakso (2019b). Both studies, however, note that collecting data from publisher websites was very laborious, because of non-standardized spreadsheet formating. In the course of the Plan S consultation, the ESAC Initiative has launched another promising data source about hybrid open access. Its Registry of Transformative Agreements crowd-sources contract summaries from library consortia. So far, the registry presents details about 44 agreements. Some of those contracts yield information about eligible journals. At the article-level, it must be observed if and to what extent library consortia will disclose open access articles from transformative agreements in the near future. Additionally, the dashboard needs to address journal subjects and indexing in bibliometric databases as potential factors affecting hybrid open access uptake in future.

As this dashboard demonstrates, funders and publishers alike can improve the evidence base of hybrid open access publishing. Crossref provides excellent guidance for publishers to make licensing metadata available via the Crossref APIs: https://support.crossref.org/hc/en-us/articles/214572423-License-metadata-Access-Indicators. For being best represented in this dashboard, publishers will have to make sure to include license URL element license_ref and a start_date equal to the date of publication in the licensing metadata, which helps to identify open access journal content as well as to differentiate between immediate and delayed open access. Likewise, research institutions, funders, and libraries can increase transparency about hybrid open access publishing including transformative agreements by reporting funded articles to the Open APC initiative on a regular basis. Library consortia should also disclose transformative aggreements with the ESAC Registry of Transformative Agreements. These activities comply with best-practise recommendations, including the ESAC’s Guidelines for Transformative Agreements, and the Plan S implementation planning.

In the end, this open source dashboard demonstrates how existing pieces of an evolving and freely available data infrastructure for analyzing scholarly publishing can be re-used to gain a better understanding of hybrid open access. In future, an improved compliance will likely extend the evidence base about hybrid open access. This will allow for broader comparisons of high practical relevance addressing market concentration and the impact of transformative agreements as means to transition journals to full open access: Is the hybrid open access market dominated by a few countries and publishers? Or were they more likely to succeed in their attempts to put open and transparent workflows in place? What is the impact of hybrid open access publishing in general, and transformative agreements, in particular, on the transition of institutional as well as subject- and funder-specific publication practices? And do these activities contribute to journal flipping from a subscription-based model to fully open access?

Meta

How to contribute?

Dashboard and analysis have been developed in the open using open tools. There are a number of ways you can help make the this work better:

  • If you don’t understand something, please let us know and submit an issue.

  • Feel free to add new features or fix bugs by sending a pull request.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Author: Najko Jahn (Scholarly Communication Analyst, SUB Göttingen), 2017.

The R Markdown file, which includes the underlying source code for this document, is available here.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Bibliography

Akbaritabar, Aliakbar, and Stephan Stahlschmidt. 2019. “Applying Crossref and Unpaywall Information to Identify Gold, Hidden Gold, Hybrid and Delayed Open Access Publications in the KB Publication Corpus.” SocArXiv, May. https://doi.org/10.31235/osf.io/sdzft.

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2019. Rmarkdown: Dynamic Documents for R. https://rmarkdown.rstudio.com.

Björk, Bo-Christer. 2012. “The Hybrid Model for Open Access Publication of Scholarly Articles: A Failed Experiment?” Journal of the American Society for Information Science and Technology 63 (8): 1496–1504. https://doi.org/10.1002/asi.22709.

———. 2017. “Growth of Hybrid Open Access, 2009–2016.” PeerJ 5: e3878. https://doi.org/10.7717/peerj.3878.

Björk, Bo-Christer, and David Solomon. 2014. “How Research Funders Can Finance APCs in Full OA and Hybrid Journals.” Learned Publishing 27 (2): 93–103. https://doi.org/10.1087/20140203.

Borges, Barbara, and JJ Allaire. 2017. Flexdashboard: R Markdown Format for Flexible Dashboards. https://CRAN.R-project.org/package=flexdashboard.

Bosman, Jeroen, and Bianca Kramer. 2018. “Open Access Levels: A Quantitative Exploration Using Web of Science and oaDOI Data.” https://doi.org/10.7287/peerj.preprints.3520v1.

Chamberlain, Scott. 2018. Crminer: Fetch ’Scholary’ Full Text from ’Crossref’. https://CRAN.R-project.org/package=crminer.

Chamberlain, Scott, Hao Zhu, Najko Jahn, Carl Boettiger, and Karthik Ram. 2019. Rcrossref: Client for Various ’Crossref’ ’Apis’. https://CRAN.R-project.org/package=rcrossref.

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2019. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Coene, John. 2018. Echarts4r: Create Interactive Graphs with ’Echarts Javascript’ Version 4. https://CRAN.R-project.org/package=echarts4r.

Crawford, Walt. 2017. Gray Open Access 2012–2016: Open Access Journals Beyond DOAJ. https://citesandinsights.info/civ17i1.pdf.

Else, Holly. 2018a. “How Unpaywall Is Transforming Open Science.” Nature 560 (7718): 290–91. https://doi.org/10.1038/d41586-018-05968-3.

———. 2018b. “Dutch Publishing Giant Cuts Off Researchers in Germany and Sweden.” Nature 559 (7715): 454–55. https://doi.org/10.1038/d41586-018-05754-1.

Geschuhn, Kai, and Graham Stone. 2017. “It’s the Workflows, Stupid! What Is Required to Make ‘Offsetting’ Work for the Open Access Transition.” Insights the UKSG Journal 30 (3): 103–14. https://doi.org/10.1629/uksg.391.

Jahn, Najko, and Anne Hobert. 2019. “Scholarly Communication Analytics: Open Access Evidence in Unpaywall.” https://subugoe.github.io/scholcomm_analytics/posts/unpaywall_evidence/.

Jahn, Najko, and Marco Tullney. 2016. “A Study of Institutional Spending on Open Access Publication Fees in Germany.” PeerJ 4: e2323. https://doi.org/10.7717/peerj.2323.

Keyes, Oliver, Jay Jacobs, Drew Schmidt, Mark Greenaway, Bob Rudis, Alex Pinto, Maryam Khezrzadeh, et al. 2018. Urltools: Vectorised Tools for Url Handling and Parsing. https://CRAN.R-project.org/package=urltools.

Kiley, Robert. 2014. “Wellcome Trust Apc Spend 2012-13: Data File.” Figshare. https://doi.org/10.6084/m9.figshare.963054.

———. 2015. “Wellcome Trust Open Access (Oa) Spend and Compliance Monitoring: 2013-14.” Figshare. https://doi.org/10.6084/m9.figshare.1321361.v5.

Laakso, Mikael. 2019. “Why We Need a Public Infrastructure for Data on Open Access.” Elephant in the Lab. https://doi.org/10.5281/zenodo.2540472.

Laakso, Mikael, and Bo-Christer Björk. 2013. “Delayed Open Access: An Overlooked High-Impact Category of Openly Available Scientific Literature.” Journal of the American Society for Information Science and Technology 64 (7): 1323–9. https://doi.org/10.1002/asi.22856.

———. 2016. “Hybrid Open Access–a Longitudinal Study.” Journal of Informetrics 10 (4): 919–32. https://doi.org/10.1016/j.joi.2016.08.002.

Laakso, Mikael, David Solomon, and Bo-Christer Björk. 2016. “How Subscription-Based Scholarly Journals Can Convert to Open Access: A Review of Approaches.” Learned Publishing 29 (4): 259–69. https://doi.org/10.1002/leap.1056.

Lawson, Stuart. 2015. “APC Data for 25 Uk Higher Education Institutions - 2014.” Figshare. https://doi.org/10.6084/m9.figshare.1305596.v5.

Lundén, Anna, Camilla Smith, and Britt-Marie Wideberg. 2018. “National Licence Negotiations Advancing the Open Access Transition a View from Sweden.” Insights the UKSG Journal 31. https://doi.org/10.1629/uksg.413.

Matthias, Lisa, Najko Jahn, and Mikael Laakso. 2019a. “Reverse Flip Open Access Journals.” https://doi.org/10.5281/zenodo.2553582.

———. 2019b. “The Two-Way Street of Open Access Journal Publishing: Flip It and Reverse It.” Publications 7 (2): 23. https://doi.org/10.3390/publications7020023.

Ooms, Jeroen. 2014. “The Jsonlite Package: A Practical and Consistent Mapping Between Json Data and R Objects.” arXiv:1403.2805 [stat.CO]. https://arxiv.org/abs/1403.2805.

Pieper, Dirk, and Christoph Broschinski. 2018. “OpenAPC: A Contribution to a Transparent and Reproducible Monitoring of Fee-Based Open Access Publishing Across Institutions and Nations.” Insights the UKSG Journal 31. https://doi.org/10.1629/uksg.439.

Pinfield, Stephen, Jennifer Salter, and Peter A. Bath. 2015. “The ’Total Cost of Publication’ in a Hybrid Open-Access Environment: Institutional Approaches to Funding Journal Article-Processing Charges in Combination with Subscriptions.” Journal of the Association for Information Science and Technology 67 (7): 1751–66. https://doi.org/10.1002/asi.23446.

———. 2017. “A Gold-Centric Implementation of Open Access: Hybrid Journals, the "Total Cost of Publication", and Policy Development in the UK and Beyond.” Journal of the Association for Information Science and Technology 68 (9): 2248–63. https://doi.org/10.1002/asi.23742.

Piwowar, Heather, Jason Priem, Vincent Larivière, Juan Pablo Alperin, Lisa Matthias, Bree Norlander, Ashley Farley, Jevin West, and Stefanie Haustein. 2018. “The State of OA: A Large-Scale Analysis of the Prevalence and Impact of Open Access Articles.” PeerJ 6: e4375. https://doi.org/10.7717/peerj.4375.

Prosser, David C. 2003. “From Here to There: A Proposed Mechanism for Transforming Journals from Closed to Open Access.” Learned Publishing 16 (3): 163–66. https://doi.org/10.1087/095315103322110923.

Reckling, Falk, and Margit Kenzian. 2014. “Austrian Science Fund (Fwf) Publication Cost Data 2013.” Figshare. https://doi.org/10.6084/m9.figshare.988754.v4.

Rimmert, Christine, Andre Bruns, Christopher Lenke, and Niels Christian Taubert. 2017. “ISSN-Matching of Gold OA Journals (ISSN-GOLD-OA) 2.0.” Bielefeld University. https://doi.org/10.4119/unibi/2913654.

Robinson-Garcia, Nicolas, Rodrigo Costas, and Thed N. van Leeuwen. 2019. “Indicators of Open Access for Universities.” In ISSI 2019 Conference. https://arxiv.org/abs/1906.03840.

Sander, Frank, Guido Hermann, Horst Hippler, Gerard Meijer, and Ralf Schimmer. 2019. “Projekt Deal – John Wiley &Amp; Son Publish and Read Agreement.” https://doi.org/10.17617/2.3027595.

Schimmer, Ralf, Kai Geschuhn, and Andreas Vogler. 2015. “Disrupting the subscription journals’business model for the necessary large-scale transformation to open access.” Max Planck Digital Library. https://doi.org/10.17617/1.3.

Severin, Anna, Matthias Egger, Martin Paul Eve, and Daniel Hürlimann. 2018. “Discipline-Specific Open Access Publishing Practices and Barriers to Change: An Evidence-Based Review.” F1000Research 7 (December): 1925. https://doi.org/10.12688/f1000research.17328.1.

Sievert, Carson. 2018. Plotly for R. https://plotly-book.cpsievert.me.

Siler, Kyle, Stefanie Haustein, Elise Smith, Vincent Larivi
‘ere, and Juan Pablo Alperin. 2018. “Authorial and Institutional Stratification in Open Access Publishing: The Case of Global Health Research.” PeerJ 6 (February): e4269. https://doi.org/10.7717/peerj.4269.

Suber, Peter. 2012. Open Access. MIT Press. https://mitpress.mit.edu/books/open-access.

Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

———. 2018. Bigrquery: An Interface to Google’s ’Bigquery’ ’Api’. https://CRAN.R-project.org/package=bigrquery.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2019. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc. https://r4ds.had.co.nz/.

Wickham, Hadley, and Lionel Henry. 2019. Tidyr: Easily Tidy Data with ’Spread()’ and ’Gather()’ Functions. https://CRAN.R-project.org/package=tidyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2017. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman; Hall/CRC. https://bookdown.org/yihui/rmarkdown.


  1. See also comments from Mark Patterson and, in a similar vein, Najko Jahn to the preprint version of Piwowar et al. (2018).↩︎

  2. See Jahn and Hobert (2019) for more details about how we use Unpaywall and Big Query in our data analytics work.↩︎

  3. I gratefully acknowledge that manual checking was performed by Alexandra Claases (hybrid open access availablity and parsing of Crossref metadata), and Cäcilia Schröer and Nick Haupka (email address retrieval https://github.com/naustica/praktikum_projekt/tree/master/aufgabe_1_recall_precision).↩︎

  4. For example, the article represented by https://api.crossref.org/works/10.1039/c7dt03848h did report a CC-BY license with a delay of 32.↩︎